Bank Churn Prediction

Problem Statement¶

Context¶

Businesses like banks which provide service have to worry about problem of 'Customer Churn' i.e. customers leaving and joining another service provider. It is important to understand which aspects of the service influence a customer's decision in this regard. Management can concentrate efforts on improvement of service, keeping in mind these priorities.

Objective¶

You as a Data scientist with the bank need to build a neural network based classifier that can determine whether a customer will leave the bank or not in the next 6 months.

Data Dictionary¶

  • CustomerId: Unique ID which is assigned to each customer

  • Surname: Last name of the customer

  • CreditScore: It defines the credit history of the customer.

  • Geography: A customer’s location

  • Gender: It defines the Gender of the customer

  • Age: Age of the customer

  • Tenure: Number of years for which the customer has been with the bank

  • NumOfProducts: refers to the number of products that a customer has purchased through the bank.

  • Balance: Account balance

  • HasCrCard: It is a categorical variable which decides whether the customer has credit card or not.

  • EstimatedSalary: Estimated salary

  • isActiveMember: Is is a categorical variable which decides whether the customer is active member of the bank or not ( Active member in the sense, using bank products regularly, making transactions etc )

  • Exited : whether or not the customer left the bank within six month. It can take two values 0=No ( Customer did not leave the bank ) 1=Yes ( Customer left the bank )

Importing necessary libraries¶

In [1]:
!pip install tensorflow==2.15.0 scikit-learn==1.2.2 matplotlib===3.7.1 seaborn==0.13.1 numpy==1.25.2 pandas==1.5.3 -q --user
In [90]:
# Library for data manipulation and analysis.
import pandas as pd
# Fundamental package for scientific computing.
import numpy as np
#splitting datasets into training and testing sets.
from sklearn.model_selection import train_test_split
#Imports tools for data preprocessing including label encoding, one-hot encoding, and standard scaling
from sklearn.preprocessing import LabelEncoder, OneHotEncoder,StandardScaler
#Imports a class for imputing missing values in datasets.
from sklearn.impute import SimpleImputer
#Imports the Matplotlib library for creating visualizations.
import matplotlib.pyplot as plt
# Imports the Seaborn library for statistical data visualization.
import seaborn as sns
# Time related functions.
import time
#Imports functions for evaluating the performance of machine learning models
from sklearn.metrics import confusion_matrix, f1_score,accuracy_score, recall_score, precision_score, classification_report

#importing SMOTE
from imblearn.over_sampling import SMOTE

import random

#Imports the tensorflow,keras and layers.
import tensorflow as tf
from tensorflow import keras
from keras import backend
from keras.models import Sequential
from keras.layers import Dense, Dropout


# to suppress unnecessary warnings
import warnings
warnings.filterwarnings("ignore")

Loading the dataset¶

In [3]:
#Read dataset
data = pd.read_csv('Churn.csv')

Data Overview¶

In [4]:
# View the first 5 rows of the data
data.head()
Out[4]:
RowNumber CustomerId Surname CreditScore Geography Gender Age Tenure Balance NumOfProducts HasCrCard IsActiveMember EstimatedSalary Exited
0 1 15634602 Hargrave 619 France Female 42 2 0.00 1 1 1 101348.88 1
1 2 15647311 Hill 608 Spain Female 41 1 83807.86 1 0 1 112542.58 0
2 3 15619304 Onio 502 France Female 42 8 159660.80 3 1 0 113931.57 1
3 4 15701354 Boni 699 France Female 39 1 0.00 2 0 0 93826.63 0
4 5 15737888 Mitchell 850 Spain Female 43 2 125510.82 1 1 1 79084.10 0
In [5]:
# View the last 5 rows of the data
data.tail()
Out[5]:
RowNumber CustomerId Surname CreditScore Geography Gender Age Tenure Balance NumOfProducts HasCrCard IsActiveMember EstimatedSalary Exited
9995 9996 15606229 Obijiaku 771 France Male 39 5 0.00 2 1 0 96270.64 0
9996 9997 15569892 Johnstone 516 France Male 35 10 57369.61 1 1 1 101699.77 0
9997 9998 15584532 Liu 709 France Female 36 7 0.00 1 0 1 42085.58 1
9998 9999 15682355 Sabbatini 772 Germany Male 42 3 75075.31 2 1 0 92888.52 1
9999 10000 15628319 Walker 792 France Female 28 4 130142.79 1 1 0 38190.78 0
In [6]:
# Check number of rows and columns
data.shape
Out[6]:
(10000, 14)
In [7]:
# check the datatypes of the columns in the dataset
data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 14 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   RowNumber        10000 non-null  int64  
 1   CustomerId       10000 non-null  int64  
 2   Surname          10000 non-null  object 
 3   CreditScore      10000 non-null  int64  
 4   Geography        10000 non-null  object 
 5   Gender           10000 non-null  object 
 6   Age              10000 non-null  int64  
 7   Tenure           10000 non-null  int64  
 8   Balance          10000 non-null  float64
 9   NumOfProducts    10000 non-null  int64  
 10  HasCrCard        10000 non-null  int64  
 11  IsActiveMember   10000 non-null  int64  
 12  EstimatedSalary  10000 non-null  float64
 13  Exited           10000 non-null  int64  
dtypes: float64(2), int64(9), object(3)
memory usage: 1.1+ MB

Check for duplicate values¶

In [8]:
# check for duplicate values in the data
data.duplicated().sum()
Out[8]:
0
In [9]:
# check for missing values in the data
round(data.isnull().sum() / data.isnull().count() * 100, 2)
Out[9]:
RowNumber          0.0
CustomerId         0.0
Surname            0.0
CreditScore        0.0
Geography          0.0
Gender             0.0
Age                0.0
Tenure             0.0
Balance            0.0
NumOfProducts      0.0
HasCrCard          0.0
IsActiveMember     0.0
EstimatedSalary    0.0
Exited             0.0
dtype: float64
In [10]:
data["Exited"].value_counts(1)
Out[10]:
0    0.7963
1    0.2037
Name: Exited, dtype: float64
In [11]:
# statistical summary of the numerical columns in the data
data.describe().T
Out[11]:
count mean std min 25% 50% 75% max
RowNumber 10000.0 5.000500e+03 2886.895680 1.00 2500.75 5.000500e+03 7.500250e+03 10000.00
CustomerId 10000.0 1.569094e+07 71936.186123 15565701.00 15628528.25 1.569074e+07 1.575323e+07 15815690.00
CreditScore 10000.0 6.505288e+02 96.653299 350.00 584.00 6.520000e+02 7.180000e+02 850.00
Age 10000.0 3.892180e+01 10.487806 18.00 32.00 3.700000e+01 4.400000e+01 92.00
Tenure 10000.0 5.012800e+00 2.892174 0.00 3.00 5.000000e+00 7.000000e+00 10.00
Balance 10000.0 7.648589e+04 62397.405202 0.00 0.00 9.719854e+04 1.276442e+05 250898.09
NumOfProducts 10000.0 1.530200e+00 0.581654 1.00 1.00 1.000000e+00 2.000000e+00 4.00
HasCrCard 10000.0 7.055000e-01 0.455840 0.00 0.00 1.000000e+00 1.000000e+00 1.00
IsActiveMember 10000.0 5.151000e-01 0.499797 0.00 0.00 1.000000e+00 1.000000e+00 1.00
EstimatedSalary 10000.0 1.000902e+05 57510.492818 11.58 51002.11 1.001939e+05 1.493882e+05 199992.48
Exited 10000.0 2.037000e-01 0.402769 0.00 0.00 0.000000e+00 0.000000e+00 1.00
In [12]:
# Check the number of unique values in each column
data.nunique()
Out[12]:
RowNumber          10000
CustomerId         10000
Surname             2932
CreditScore          460
Geography              3
Gender                 2
Age                   70
Tenure                11
Balance             6382
NumOfProducts          4
HasCrCard              2
IsActiveMember         2
EstimatedSalary     9999
Exited                 2
dtype: int64
In [13]:
for i in data.describe(include=["object"]).columns:
    print("Unique values in", i, "are :")
    print(data[i].value_counts())
    print("*" * 50)
Unique values in Surname are :
Smith        32
Martin       29
Scott        29
Walker       28
Brown        26
             ..
Wells         1
Calzada       1
Gresswell     1
Aguirre       1
Morales       1
Name: Surname, Length: 2932, dtype: int64
**************************************************
Unique values in Geography are :
France     5014
Germany    2509
Spain      2477
Name: Geography, dtype: int64
**************************************************
Unique values in Gender are :
Male      5457
Female    4543
Name: Gender, dtype: int64
**************************************************

Exploratory Data Analysis¶

In [14]:
# function to plot a boxplot and a histogram along the same scale.


def histogram_boxplot(data, feature, figsize=(12, 7), kde=False, bins=None):
    """
    Boxplot and histogram combined

    data: dataframe
    feature: dataframe column
    figsize: size of figure (default (12,7))
    kde: whether to the show density curve (default False)
    bins: number of bins for histogram (default None)
    """
    f2, (ax_box2, ax_hist2) = plt.subplots(
        nrows=2,  # Number of rows of the subplot grid= 2
        sharex=True,  # x-axis will be shared among all subplots
        gridspec_kw={"height_ratios": (0.25, 0.75)},
        figsize=figsize,
    )  # creating the 2 subplots
    sns.boxplot(
        data=data, x=feature, ax=ax_box2, showmeans=True, color="violet"
    )  # boxplot will be created and a triangle will indicate the mean value of the column
    sns.histplot(
        data=data, x=feature, kde=kde, ax=ax_hist2, bins=bins, palette="winter"
    ) if bins else sns.histplot(
        data=data, x=feature, kde=kde, ax=ax_hist2
    )  # For histogram
    ax_hist2.axvline(
        data[feature].mean(), color="green", linestyle="--"
    )  # Add mean to the histogram
    ax_hist2.axvline(
        data[feature].median(), color="black", linestyle="-"
    )  # Add median to the histogram
In [15]:
# function to create labeled barplots


def labeled_barplot(data, feature, perc=False, n=None):
    """
    Barplot with percentage at the top

    data: dataframe
    feature: dataframe column
    perc: whether to display percentages instead of count (default is False)
    n: displays the top n category levels (default is None, i.e., display all levels)
    """

    total = len(data[feature])  # length of the column
    count = data[feature].nunique()
    if n is None:
        plt.figure(figsize=(count + 1, 5))
    else:
        plt.figure(figsize=(n + 1, 5))

    plt.xticks(rotation=90, fontsize=15)
    ax = sns.countplot(
        data=data,
        x=feature,
        palette="Paired",
        order=data[feature].value_counts().index[:n].sort_values(),
    )

    for p in ax.patches:
        if perc == True:
            label = "{:.1f}%".format(
                100 * p.get_height() / total
            )  # percentage of each class of the category
        else:
            label = p.get_height()  # count of each level of the category

        x = p.get_x() + p.get_width() / 2  # width of the plot
        y = p.get_height()  # height of the plot

        ax.annotate(
            label,
            (x, y),
            ha="center",
            va="center",
            size=12,
            xytext=(0, 5),
            textcoords="offset points",
        )  # annotate the percentage

    plt.show()  # show the plot
In [16]:
# function to plot stacked bar chart

def stacked_barplot(data, predictor, target):
    """
    Print the category counts and plot a stacked bar chart

    data: dataframe
    predictor: independent variable
    target: target variable
    """
    count = data[predictor].nunique()
    sorter = data[target].value_counts().index[-1]
    tab1 = pd.crosstab(data[predictor], data[target], margins=True).sort_values(
        by=sorter, ascending=False
    )
    print(tab1)
    print("-" * 120)
    tab = pd.crosstab(data[predictor], data[target], normalize="index").sort_values(
        by=sorter, ascending=False
    )
    tab.plot(kind="bar", stacked=True, figsize=(count + 1, 5))
    plt.legend(
        loc="lower left", frameon=False,
    )
    plt.legend(loc="upper left", bbox_to_anchor=(1, 1))
    plt.show()
In [17]:
### Function to plot distributions

def distribution_plot_wrt_target(data, predictor, target):

    fig, axs = plt.subplots(2, 2, figsize=(12, 10))

    target_uniq = data[target].unique()

    axs[0, 0].set_title("Distribution of target for target=" + str(target_uniq[0]))
    sns.histplot(
        data=data[data[target] == target_uniq[0]],
        x=predictor,
        kde=True,
        ax=axs[0, 0],
        color="teal",
    )

    axs[0, 1].set_title("Distribution of target for target=" + str(target_uniq[1]))
    sns.histplot(
        data=data[data[target] == target_uniq[1]],
        x=predictor,
        kde=True,
        ax=axs[0, 1],
        color="orange",
    )

    axs[1, 0].set_title("Boxplot w.r.t target")
    sns.boxplot(data=data, x=target, y=predictor, ax=axs[1, 0], palette="gist_rainbow")

    axs[1, 1].set_title("Boxplot (without outliers) w.r.t target")
    sns.boxplot(
        data=data,
        x=target,
        y=predictor,
        ax=axs[1, 1],
        showfliers=False,
        palette="gist_rainbow",
    )

    plt.tight_layout()
    plt.show()

Univariate Analysis¶

In [18]:
num_col_sel = data.select_dtypes(include=np.number).columns.tolist()

for item in num_col_sel:
    histogram_boxplot(data, item)

Median estimated salary is 100000.

CustomerId¶

In [19]:
data['CustomerId'].nunique()
Out[19]:
10000

CustomerId is unique for each customer.

Surname¶

In [20]:
data['Surname'].nunique()
Out[20]:
2932
In [21]:
data['Surname'].value_counts()
Out[21]:
Smith        32
Martin       29
Scott        29
Walker       28
Brown        26
             ..
Wells         1
Calzada       1
Gresswell     1
Aguirre       1
Morales       1
Name: Surname, Length: 2932, dtype: int64

32 customers had surname "Smith".

CreditScore¶

In [22]:
data['CreditScore'].nunique()
Out[22]:
460
In [23]:
data['CreditScore'].value_counts()
Out[23]:
850    233
678     63
655     54
705     53
667     53
      ... 
351      1
365      1
382      1
373      1
419      1
Name: CreditScore, Length: 460, dtype: int64
In [24]:
sns.boxplot(data=data,x='CreditScore')
#Boxplot to show the distribution of Customer_Age
Out[24]:
<Axes: xlabel='CreditScore'>

Credit score has many outliers on the lower end but the mean falls around 650.

Geography¶

In [25]:
data['Geography'].nunique()
Out[25]:
3
In [26]:
data['Geography'].value_counts()
Out[26]:
France     5014
Germany    2509
Spain      2477
Name: Geography, dtype: int64
In [27]:
labeled_barplot(data,'Geography', perc=True)

50.1% customers are from France. The three countries that customers belong to are france, germany, and spain.

Gender¶

In [28]:
data['Gender'].nunique()
Out[28]:
2
In [29]:
data['Gender'].value_counts()
Out[29]:
Male      5457
Female    4543
Name: Gender, dtype: int64
In [30]:
labeled_barplot(data,'Gender', perc=True)
Observation¶

45.4% of the clients are female.

54.6% of the clients are male.

Age¶

In [31]:
data['Age'].nunique()
Out[31]:
70
In [32]:
data['Age'].value_counts()
Out[32]:
37    478
38    477
35    474
36    456
34    447
     ... 
84      2
88      1
82      1
85      1
83      1
Name: Age, Length: 70, dtype: int64

The mode for Customer Age is 44

In [33]:
sns.boxplot(data=data,x='Age')
#Boxplot to show the distribution of Customer_Age
Out[33]:
<Axes: xlabel='Age'>

Tenure¶

In [34]:
data['Tenure'].nunique()
Out[34]:
11
In [35]:
data['Tenure'].value_counts()
Out[35]:
2     1048
1     1035
7     1028
8     1025
5     1012
3     1009
4      989
9      984
6      967
10     490
0      413
Name: Tenure, dtype: int64
In [36]:
sns.boxplot(data=data,x='Tenure')
#Boxplot to show the distribution of Customer_Age
Out[36]:
<Axes: xlabel='Tenure'>

50 % of the customers have stayed for a tenure of 3 to 7 years.

NumOfProducts¶

In [37]:
data['NumOfProducts'].nunique()
Out[37]:
4
In [38]:
data['NumOfProducts'].value_counts()
Out[38]:
1    5084
2    4590
3     266
4      60
Name: NumOfProducts, dtype: int64
In [39]:
sns.boxplot(data=data,x='NumOfProducts')
#Boxplot to show the distribution of Total_Trans_Ct
Out[39]:
<Axes: xlabel='NumOfProducts'>

75% have 2 products or less.

Balance¶

In [40]:
data['Balance'].nunique()
Out[40]:
6382
In [41]:
data['Balance'].value_counts()
Out[41]:
0.00         3617
130170.82       2
105473.74       2
159397.75       1
144238.70       1
             ... 
108698.96       1
238387.56       1
111833.47       1
126619.27       1
138734.94       1
Name: Balance, Length: 6382, dtype: int64
In [42]:
sns.boxplot(data=data,x='Balance')
#Boxplot to show the distribution of Total_Trans_Ct
Out[42]:
<Axes: xlabel='Balance'>

HasCrCard¶

In [43]:
data['HasCrCard'].nunique()
Out[43]:
2
In [44]:
data['HasCrCard'].value_counts()
Out[44]:
1    7055
0    2945
Name: HasCrCard, dtype: int64
In [45]:
labeled_barplot(data,'HasCrCard', perc=True)

Over 70 % have a credit card.

EstimatedSalary¶

In [46]:
data['EstimatedSalary'].nunique()
Out[46]:
9999
In [47]:
data['EstimatedSalary'].value_counts()
Out[47]:
24924.92     2
121505.61    1
89874.82     1
72500.68     1
182692.80    1
            ..
188377.21    1
55902.93     1
4523.74      1
102195.16    1
2465.80      1
Name: EstimatedSalary, Length: 9999, dtype: int64
In [48]:
sns.boxplot(data=data,x='EstimatedSalary')
#Boxplot to show the distribution of Total_Trans_Ct
Out[48]:
<Axes: xlabel='EstimatedSalary'>

50% of the customers have a salary of a 100k or more.

IsActiveMember¶

In [49]:
data['IsActiveMember'].nunique()
Out[49]:
2
In [50]:
data['IsActiveMember'].value_counts()
Out[50]:
1    5151
0    4849
Name: IsActiveMember, dtype: int64
In [51]:
labeled_barplot(data,'IsActiveMember', perc=True)

48.5% of the members are not active.

Exited¶

In [52]:
data['Exited'].nunique()
Out[52]:
2
In [53]:
data['Exited'].value_counts()
Out[53]:
0    7963
1    2037
Name: Exited, dtype: int64
In [54]:
labeled_barplot(data,'Exited', perc=True)

20.4% of the customers have exited.

Multivariante Analysis¶

In [55]:
num_col = data.select_dtypes(include=np.number).columns.tolist()
plt.figure(figsize=(15, 7))
sns.heatmap(data[num_col].corr(), annot=True, vmin=-1, vmax=1, fmt=".2f", cmap="Spectral")
plt.show()

Age shows a relatively large correlance with Exited. IsActiveMember showed a negative correlance with Exited. Balance has a posible correlance to exited.

In [56]:
sns.pairplot(data=data[num_col], diag_kind="kde")
plt.show()

Bivariate Analysis¶

Exited vs Surname¶
In [57]:
data['Surname'].value_counts(5)
Out[57]:
Smith        0.0032
Martin       0.0029
Scott        0.0029
Walker       0.0028
Brown        0.0026
              ...  
Wells        0.0001
Calzada      0.0001
Gresswell    0.0001
Aguirre      0.0001
Morales      0.0001
Name: Surname, Length: 2932, dtype: float64
Exited vs CreditScore¶
In [58]:
distribution_plot_wrt_target(data, "CreditScore", "Exited")

There are more outliers with credit score below 400 for customers that left the bank.

Exited vs Geography¶

In [59]:
stacked_barplot(data, "Geography", "Exited")
Exited        0     1    All
Geography                   
All        7963  2037  10000
Germany    1695   814   2509
France     4204   810   5014
Spain      2064   413   2477
------------------------------------------------------------------------------------------------------------------------

Although the largest customer base is in France, the highest number of customers who left were based in Germany.

Exited vs Gender¶

In [60]:
stacked_barplot(data, "Gender", "Exited")
Exited     0     1    All
Gender                   
All     7963  2037  10000
Female  3404  1139   4543
Male    4559   898   5457
------------------------------------------------------------------------------------------------------------------------

Higher percentage of females exited the bank.

Exited vs Age¶

In [61]:
distribution_plot_wrt_target(data, "Age", "Exited")

50% of the customers who exited are of age 45 or older, while more than 75% of the customers who did not exit are younger than 45.

Exited vs Tenure¶

In [62]:
distribution_plot_wrt_target(data, "Tenure", "Exited")

Exited vs NumOfProducts¶

In [63]:
stacked_barplot(data, "NumOfProducts", "Exited")
Exited            0     1    All
NumOfProducts                   
All            7963  2037  10000
1              3675  1409   5084
2              4242   348   4590
3                46   220    266
4                 0    60     60
------------------------------------------------------------------------------------------------------------------------

Customers with

Exited vs Balance¶

In [64]:
distribution_plot_wrt_target(data, "Balance", "Exited")

Exited vs HasCrCard¶

In [65]:
stacked_barplot(data, "HasCrCard", "Exited")
Exited        0     1    All
HasCrCard                   
All        7963  2037  10000
1          5631  1424   7055
0          2332   613   2945
------------------------------------------------------------------------------------------------------------------------

Exited vs EstimatedSalary¶

In [66]:
distribution_plot_wrt_target(data, "EstimatedSalary", "Exited")

Exited vs IsActiveMember¶

In [67]:
stacked_barplot(data, "IsActiveMember", "Exited")
Exited             0     1    All
IsActiveMember                   
All             7963  2037  10000
0               3547  1302   4849
1               4416   735   5151
------------------------------------------------------------------------------------------------------------------------

Data Preprocessing¶

In [68]:
#Drop column "CustomerId as it is unique and will not add value to the modeling
data.drop(['CustomerId'], axis=1, inplace=True)

Dummy Variable Creation¶

In [69]:
#Separate target and dependent Columns
X = data.drop(['Exited'],axis=1)
Y = data['Exited']
In [70]:
X.columns
Out[70]:
Index(['RowNumber', 'Surname', 'CreditScore', 'Geography', 'Gender', 'Age',
       'Tenure', 'Balance', 'NumOfProducts', 'HasCrCard', 'IsActiveMember',
       'EstimatedSalary'],
      dtype='object')

Missing value treatment¶

In [71]:
#Calculate the total number of null values for each columns.
X.isnull().sum()
Out[71]:
RowNumber          0
Surname            0
CreditScore        0
Geography          0
Gender             0
Age                0
Tenure             0
Balance            0
NumOfProducts      0
HasCrCard          0
IsActiveMember     0
EstimatedSalary    0
dtype: int64

There are no missing values.

In [72]:
#Encode categorical variables using one-hot encoding
X = pd.get_dummies(
    X,
    columns=X.select_dtypes(include=["object"]).columns.tolist(),
    drop_first=True, dtype= float
)

Train-validation-test Split¶

In [73]:
#Split dataset into the Training set and Test set.
X_train, X_test, y_train, y_test = train_test_split(X,Y, test_size = 0.2, random_state = 42,stratify = Y)
In [74]:
# Split Train dataset into the Training and Validation sets.
X_train, X_valid, y_train, y_valid = train_test_split(X_train,y_train, test_size = 0.2, random_state = 42,stratify = y_train)
In [75]:
print("Number of rows in train data =", X_train.shape[0])
print("Number of rows in validation data =", X_valid.shape[0])
print("Number of rows in test data =", X_test.shape[0])
Number of rows in train data = 6400
Number of rows in validation data = 1600
Number of rows in test data = 2000
In [76]:
print("Number of rows in train data =", y_train.shape[0])
print("Number of rows in validation data =", y_valid.shape[0])
print("Number of rows in test data =", y_test.shape[0])
Number of rows in train data = 6400
Number of rows in validation data = 1600
Number of rows in test data = 2000

Data Normalization¶

In [77]:
#Standardize the numerical variables
num_col = X.select_dtypes(include=np.number).columns.tolist()
transformer = StandardScaler()
X_train[num_col] = transformer.fit_transform(X_train[num_col])
X_valid[num_col] = transformer.fit_transform(X_valid[num_col])
X_test[num_col] = transformer.fit_transform(X_test[num_col])

Model Building¶

Model Evaluation Criterion¶

Write down the logic for choosing the metric that would be the best metric for this business scenario.

-

Recall would be the best metric for this situation because of it's ability to give adequate importance to minority class. Since the group of exited class is a minority, recall is a better option. Also high recall makes sure that positive cases are not ignored. Therefore recall is the best metric for this business scenario.

In [78]:
# defining a function to compute different metrics to check performance of a classification model built using statsmodels
def model_performance_classification(
    model, predictors, target, threshold=0.5
):
    """
    Function to compute different metrics to check classification model performance

    model: classifier
    predictors: independent variables
    target: dependent variable
    threshold: threshold for classifying the observation as class 1
    """

    # checking which probabilities are greater than threshold
    pred = model.predict(predictors) > threshold
    # pred_temp = model.predict(predictors) > threshold
    # # rounding off the above values to get classes
    # pred = np.round(pred_temp)

    acc = accuracy_score(target, pred)  # to compute Accuracy
    recall = recall_score(target, pred, average='weighted')  # to compute Recall
    precision = precision_score(target, pred, average='weighted')  # to compute Precision
    f1 = f1_score(target, pred, average='weighted')  # to compute F1-score

    # creating a dataframe of metrics
    df_perf = pd.DataFrame(
        {"Accuracy": acc, "Recall": recall, "Precision": precision, "F1 Score": f1,},
        index=[0],
    )

    return df_perf
In [79]:
def plot(history, name):
    """
    Function to plot loss/accuracy

    history: an object which stores the metrics and losses.
    name: can be one of Loss or Accuracy
    """
    fig, ax = plt.subplots() #Creating a subplot with figure and axes.
    plt.plot(history.history[name]) #Plotting the train accuracy or train loss
    plt.plot(history.history['val_'+name]) #Plotting the validation accuracy or validation loss

    plt.title('Model ' + name.capitalize()) #Defining the title of the plot.
    plt.ylabel(name.capitalize()) #Capitalizing the first letter.
    plt.xlabel('Epoch') #Defining the label for the x-axis.
    fig.legend(['Train', 'Validation'], loc="outside right upper") #Defining the legend, loc controls the position of the legend.
In [80]:
def make_confusion_matrix(actual_targets, predicted_targets):
    """
    To plot the confusion_matrix with percentages

    actual_targets: actual target (dependent) variable values
    predicted_targets: predicted target (dependent) variable values
    """
    cm = confusion_matrix(actual_targets, predicted_targets)
    labels = np.asarray(
        [
            ["{0:0.0f}".format(item) + "\n{0:.2%}".format(item / cm.flatten().sum())]
            for item in cm.flatten()
        ]
    ).reshape(cm.shape[0], cm.shape[1])

    plt.figure(figsize=(6, 4))
    sns.heatmap(cm, annot=labels, fmt="")
    plt.ylabel("True label")
    plt.xlabel("Predicted label")
In [81]:
train_data=pd.DataFrame(columns=["recall"])
valid_data=pd.DataFrame(columns=["recall"])

Neural Network with SGD Optimizer¶

In [82]:
#*** Calculate class weights for imbalanced dataset
cw = (y_train.shape[0]) / np.bincount(y_train)

# Create a dictionary mapping class indices to their respective class weights
cw_dict = {}
for i in range(cw.shape[0]):
    cw_dict[i] = cw[i]

cw_dict
Out[82]:
{0: 1.2558869701726845, 1: 4.9079754601226995}
In [83]:
# defining the batch size and # epochs upfront as we'll be using the same values for all models
epochs = 25
batch_size = 65
In [84]:
backend.clear_session()
np.random.seed(2)
random.seed(2)
tf.random.set_seed(2)
In [85]:
#Initializing the neural network
model0 = Sequential()
model0.add(Dense(70,activation="relu",input_dim=X_train.shape[1]))
model0.add(Dense(17,activation="relu"))
model0.add(Dense(1,activation="sigmoid"))
In [86]:
model0.summary()
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 dense (Dense)               (None, 70)                206080    
                                                                 
 dense_1 (Dense)             (None, 17)                1207      
                                                                 
 dense_2 (Dense)             (None, 1)                 18        
                                                                 
=================================================================
Total params: 207305 (809.79 KB)
Trainable params: 207305 (809.79 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
In [89]:
 
In [92]:
optimizer = tf.keras.optimizers.SGD(0.001)    # defining SGD as the optimizer to be used
metric = keras.metrics.Recall()
model0.compile(loss='binary_crossentropy', optimizer=optimizer,metrics=['Recall'])
In [93]:
start = time.time()
history = model0.fit(X_train, y_train, validation_data=(X_valid,y_valid) , batch_size=batch_size, epochs=epochs,class_weight=cw_dict)
end=time.time()
Epoch 1/25
99/99 [==============================] - 6s 14ms/step - loss: 1.5906 - recall: 0.7400 - val_loss: 0.7870 - val_recall: 0.6779
Epoch 2/25
99/99 [==============================] - 0s 5ms/step - loss: 1.4873 - recall: 0.6426 - val_loss: 0.7406 - val_recall: 0.5613
Epoch 3/25
99/99 [==============================] - 1s 5ms/step - loss: 1.4291 - recall: 0.5890 - val_loss: 0.7214 - val_recall: 0.4969
Epoch 4/25
99/99 [==============================] - 0s 5ms/step - loss: 1.3840 - recall: 0.6043 - val_loss: 0.7078 - val_recall: 0.4877
Epoch 5/25
99/99 [==============================] - 1s 5ms/step - loss: 1.3461 - recall: 0.6288 - val_loss: 0.6961 - val_recall: 0.4663
Epoch 6/25
99/99 [==============================] - 0s 5ms/step - loss: 1.3133 - recall: 0.6327 - val_loss: 0.6903 - val_recall: 0.4509
Epoch 7/25
99/99 [==============================] - 0s 5ms/step - loss: 1.2837 - recall: 0.6511 - val_loss: 0.6855 - val_recall: 0.4356
Epoch 8/25
99/99 [==============================] - 0s 5ms/step - loss: 1.2565 - recall: 0.6741 - val_loss: 0.6790 - val_recall: 0.4294
Epoch 9/25
99/99 [==============================] - 0s 5ms/step - loss: 1.2313 - recall: 0.6848 - val_loss: 0.6737 - val_recall: 0.4110
Epoch 10/25
99/99 [==============================] - 0s 5ms/step - loss: 1.2074 - recall: 0.6856 - val_loss: 0.6724 - val_recall: 0.4294
Epoch 11/25
99/99 [==============================] - 0s 5ms/step - loss: 1.1848 - recall: 0.7040 - val_loss: 0.6680 - val_recall: 0.4294
Epoch 12/25
99/99 [==============================] - 0s 5ms/step - loss: 1.1630 - recall: 0.7178 - val_loss: 0.6637 - val_recall: 0.4141
Epoch 13/25
99/99 [==============================] - 0s 5ms/step - loss: 1.1417 - recall: 0.7301 - val_loss: 0.6585 - val_recall: 0.3988
Epoch 14/25
99/99 [==============================] - 1s 7ms/step - loss: 1.1214 - recall: 0.7377 - val_loss: 0.6538 - val_recall: 0.3834
Epoch 15/25
99/99 [==============================] - 1s 6ms/step - loss: 1.1015 - recall: 0.7393 - val_loss: 0.6506 - val_recall: 0.3773
Epoch 16/25
99/99 [==============================] - 1s 7ms/step - loss: 1.0821 - recall: 0.7508 - val_loss: 0.6476 - val_recall: 0.3773
Epoch 17/25
99/99 [==============================] - 1s 7ms/step - loss: 1.0631 - recall: 0.7577 - val_loss: 0.6459 - val_recall: 0.3834
Epoch 18/25
99/99 [==============================] - 1s 7ms/step - loss: 1.0447 - recall: 0.7661 - val_loss: 0.6430 - val_recall: 0.3865
Epoch 19/25
99/99 [==============================] - 0s 5ms/step - loss: 1.0266 - recall: 0.7730 - val_loss: 0.6398 - val_recall: 0.3896
Epoch 20/25
99/99 [==============================] - 0s 5ms/step - loss: 1.0091 - recall: 0.7738 - val_loss: 0.6383 - val_recall: 0.3957
Epoch 21/25
99/99 [==============================] - 0s 5ms/step - loss: 0.9921 - recall: 0.7807 - val_loss: 0.6370 - val_recall: 0.3926
Epoch 22/25
99/99 [==============================] - 0s 4ms/step - loss: 0.9754 - recall: 0.7899 - val_loss: 0.6317 - val_recall: 0.3896
Epoch 23/25
99/99 [==============================] - 0s 5ms/step - loss: 0.9594 - recall: 0.7899 - val_loss: 0.6309 - val_recall: 0.3896
Epoch 24/25
99/99 [==============================] - 0s 5ms/step - loss: 0.9435 - recall: 0.7991 - val_loss: 0.6284 - val_recall: 0.3865
Epoch 25/25
99/99 [==============================] - 0s 5ms/step - loss: 0.9284 - recall: 0.8029 - val_loss: 0.6257 - val_recall: 0.3865
In [94]:
print("Time taken in seconds ",end-start)
Time taken in seconds  19.47603940963745
In [95]:
plot(history,'loss')
In [96]:
y_train_pred = model0.predict(X_train)
y_train_pred = (y_train_pred > 0.5)
y_train_pred
200/200 [==============================] - 1s 3ms/step
Out[96]:
array([[ True],
       [False],
       [False],
       ...,
       [False],
       [ True],
       [False]])
In [97]:
y_valid_pred = model0.predict(X_valid)
y_valid_pred = (y_valid_pred > 0.5)
y_valid_pred
50/50 [==============================] - 0s 5ms/step
Out[97]:
array([[False],
       [False],
       [False],
       ...,
       [ True],
       [ True],
       [ True]])
In [98]:
cl_rp = classification_report(y_train, y_train_pred)
print(cl_rp)
              precision    recall  f1-score   support

           0       0.94      0.79      0.86      5096
           1       0.50      0.82      0.62      1304

    accuracy                           0.79      6400
   macro avg       0.72      0.80      0.74      6400
weighted avg       0.85      0.79      0.81      6400

In [99]:
cl_rp = classification_report(y_valid, y_valid_pred)
print(cl_rp)
              precision    recall  f1-score   support

           0       0.81      0.69      0.75      1274
           1       0.24      0.39      0.30       326

    accuracy                           0.63      1600
   macro avg       0.53      0.54      0.52      1600
weighted avg       0.70      0.63      0.66      1600

In [100]:
modelName="NN SGD"
train_data.loc[modelName] = recall_score(y_train, y_train_pred)
valid_data.loc[modelName] = recall_score(y_valid, y_valid_pred)
In [101]:
make_confusion_matrix(y_train, y_train_pred)
In [102]:
make_confusion_matrix(y_valid, y_valid_pred)

Recall score for both training set and validation set can be improved. Possibly an underfitted model.

Model Performance Improvement¶

Neural Network with Adam Optimizer¶

In [103]:
backend.clear_session()
np.random.seed(2)
random.seed(2)
tf.random.set_seed(2)
In [104]:
#Initializing the neural network
model = Sequential()
model.add(Dense(70,activation="relu",input_dim=X_train.shape[1]))
model.add(Dense(17,activation="relu"))
model.add(Dense(1,activation="sigmoid"))
In [105]:
model.summary()
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 dense (Dense)               (None, 70)                206080    
                                                                 
 dense_1 (Dense)             (None, 17)                1207      
                                                                 
 dense_2 (Dense)             (None, 1)                 18        
                                                                 
=================================================================
Total params: 207305 (809.79 KB)
Trainable params: 207305 (809.79 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
In [107]:
optimizer = tf.keras.optimizers.Adam()    # defining SGD as the optimizer to be used
metric = keras.metrics.Recall()
model.compile(loss='binary_crossentropy', optimizer=optimizer,metrics=['Recall'])
In [108]:
start = time.time()
history = model.fit(X_train, y_train, validation_data=(X_valid,y_valid) , batch_size=batch_size, epochs=epochs,class_weight=cw_dict)
end=time.time()
Epoch 1/25
99/99 [==============================] - 2s 8ms/step - loss: 1.5271 - recall: 0.5023 - val_loss: 0.7103 - val_recall: 0.6350
Epoch 2/25
99/99 [==============================] - 1s 7ms/step - loss: 1.0437 - recall: 0.7945 - val_loss: 0.6428 - val_recall: 0.4785
Epoch 3/25
99/99 [==============================] - 1s 7ms/step - loss: 0.8643 - recall: 0.8090 - val_loss: 0.6878 - val_recall: 0.5368
Epoch 4/25
99/99 [==============================] - 1s 8ms/step - loss: 0.7701 - recall: 0.8459 - val_loss: 0.6950 - val_recall: 0.5613
Epoch 5/25
99/99 [==============================] - 1s 8ms/step - loss: 0.7176 - recall: 0.8597 - val_loss: 0.6614 - val_recall: 0.5368
Epoch 6/25
99/99 [==============================] - 1s 9ms/step - loss: 0.6704 - recall: 0.8612 - val_loss: 0.6736 - val_recall: 0.5460
Epoch 7/25
99/99 [==============================] - 1s 8ms/step - loss: 0.6408 - recall: 0.8873 - val_loss: 0.6704 - val_recall: 0.5368
Epoch 8/25
99/99 [==============================] - 1s 7ms/step - loss: 0.6012 - recall: 0.8934 - val_loss: 0.7028 - val_recall: 0.5798
Epoch 9/25
99/99 [==============================] - 1s 7ms/step - loss: 0.5700 - recall: 0.9087 - val_loss: 0.7018 - val_recall: 0.5644
Epoch 10/25
99/99 [==============================] - 0s 5ms/step - loss: 0.5349 - recall: 0.9149 - val_loss: 0.7486 - val_recall: 0.6012
Epoch 11/25
99/99 [==============================] - 0s 5ms/step - loss: 0.5027 - recall: 0.9241 - val_loss: 0.7366 - val_recall: 0.5920
Epoch 12/25
99/99 [==============================] - 0s 5ms/step - loss: 0.4707 - recall: 0.9394 - val_loss: 0.7670 - val_recall: 0.5982
Epoch 13/25
99/99 [==============================] - 0s 5ms/step - loss: 0.4320 - recall: 0.9517 - val_loss: 0.7595 - val_recall: 0.5890
Epoch 14/25
99/99 [==============================] - 0s 5ms/step - loss: 0.3945 - recall: 0.9555 - val_loss: 0.7474 - val_recall: 0.5706
Epoch 15/25
99/99 [==============================] - 0s 5ms/step - loss: 0.3568 - recall: 0.9578 - val_loss: 0.7983 - val_recall: 0.5828
Epoch 16/25
99/99 [==============================] - 0s 5ms/step - loss: 0.3180 - recall: 0.9663 - val_loss: 0.8367 - val_recall: 0.5920
Epoch 17/25
99/99 [==============================] - 0s 5ms/step - loss: 0.2817 - recall: 0.9747 - val_loss: 0.8649 - val_recall: 0.5828
Epoch 18/25
99/99 [==============================] - 0s 5ms/step - loss: 0.2574 - recall: 0.9709 - val_loss: 0.8549 - val_recall: 0.5706
Epoch 19/25
99/99 [==============================] - 0s 5ms/step - loss: 0.2206 - recall: 0.9793 - val_loss: 0.9081 - val_recall: 0.5675
Epoch 20/25
99/99 [==============================] - 1s 9ms/step - loss: 0.1857 - recall: 0.9839 - val_loss: 0.9439 - val_recall: 0.5644
Epoch 21/25
99/99 [==============================] - 1s 10ms/step - loss: 0.1655 - recall: 0.9893 - val_loss: 1.0050 - val_recall: 0.5583
Epoch 22/25
99/99 [==============================] - 1s 9ms/step - loss: 0.1472 - recall: 0.9854 - val_loss: 0.9935 - val_recall: 0.5399
Epoch 23/25
99/99 [==============================] - 1s 12ms/step - loss: 0.1254 - recall: 0.9931 - val_loss: 1.0196 - val_recall: 0.5399
Epoch 24/25
99/99 [==============================] - 1s 14ms/step - loss: 0.1057 - recall: 0.9939 - val_loss: 1.0286 - val_recall: 0.5245
Epoch 25/25
99/99 [==============================] - 1s 11ms/step - loss: 0.0906 - recall: 0.9954 - val_loss: 1.0719 - val_recall: 0.5337
In [109]:
print("Time taken in seconds ",end-start)
Time taken in seconds  20.29223370552063
In [110]:
plot(history,'loss')
In [111]:
y_train_pred = model.predict(X_train)
y_train_pred = (y_train_pred > 0.5)
y_train_pred
200/200 [==============================] - 1s 3ms/step
Out[111]:
array([[ True],
       [False],
       [False],
       ...,
       [False],
       [ True],
       [False]])
In [112]:
y_valid_pred = model.predict(X_valid)
y_valid_pred = (y_valid_pred > 0.5)
y_valid_pred
50/50 [==============================] - 0s 2ms/step
Out[112]:
array([[False],
       [False],
       [False],
       ...,
       [False],
       [ True],
       [False]])
In [113]:
cl_rp = classification_report(y_train, y_train_pred)
print(cl_rp)
              precision    recall  f1-score   support

           0       1.00      0.99      0.99      5096
           1       0.95      1.00      0.97      1304

    accuracy                           0.99      6400
   macro avg       0.97      0.99      0.98      6400
weighted avg       0.99      0.99      0.99      6400

In [114]:
cl_rp = classification_report(y_valid, y_valid_pred)
print(cl_rp)
              precision    recall  f1-score   support

           0       0.87      0.77      0.82      1274
           1       0.38      0.53      0.44       326

    accuracy                           0.72      1600
   macro avg       0.62      0.65      0.63      1600
weighted avg       0.77      0.72      0.74      1600

In [115]:
modelName="NN Adam"
train_data.loc[modelName] = recall_score(y_train, y_train_pred)
valid_data.loc[modelName] = recall_score(y_valid, y_valid_pred)
In [116]:
make_confusion_matrix(y_train, y_train_pred)
In [117]:
make_confusion_matrix(y_valid, y_valid_pred)

NN Adam improved performance on both validation and training set compared to NN SGD.

Neural Network with Adam Optimizer and Dropout¶

In [118]:
backend.clear_session()
np.random.seed(2)
random.seed(2)
tf.random.set_seed(2)
In [119]:
#Initializing the neural network
model1 = Sequential()
model1.add(Dense(35,activation="relu",input_dim=X_train.shape[1]))
model1.add(Dropout(0.4))
model1.add(Dense(7,activation="relu"))
model1.add(Dropout(0.2))
model1.add(Dense(1,activation="sigmoid"))
In [120]:
model1.summary()
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 dense (Dense)               (None, 35)                103040    
                                                                 
 dropout (Dropout)           (None, 35)                0         
                                                                 
 dense_1 (Dense)             (None, 7)                 252       
                                                                 
 dropout_1 (Dropout)         (None, 7)                 0         
                                                                 
 dense_2 (Dense)             (None, 1)                 8         
                                                                 
=================================================================
Total params: 103300 (403.52 KB)
Trainable params: 103300 (403.52 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
In [121]:
optimizer = tf.keras.optimizers.Adam()    # defining SGD as the optimizer to be used
metric = keras.metrics.Recall()
model1.compile(loss='binary_crossentropy', optimizer=optimizer,metrics = ['Recall'])
In [122]:
start = time.time()
history = model1.fit(X_train, y_train, validation_data=(X_valid,y_valid) , batch_size=batch_size, epochs=epochs,class_weight=cw_dict)
end=time.time()
Epoch 1/25
99/99 [==============================] - 4s 7ms/step - loss: 1.8455 - recall: 0.5897 - val_loss: 0.6725 - val_recall: 0.2791
Epoch 2/25
99/99 [==============================] - 0s 5ms/step - loss: 1.4818 - recall: 0.4701 - val_loss: 0.6574 - val_recall: 0.2822
Epoch 3/25
99/99 [==============================] - 0s 5ms/step - loss: 1.3689 - recall: 0.5215 - val_loss: 0.6477 - val_recall: 0.3129
Epoch 4/25
99/99 [==============================] - 1s 6ms/step - loss: 1.2457 - recall: 0.6495 - val_loss: 0.6402 - val_recall: 0.4080
Epoch 5/25
99/99 [==============================] - 1s 5ms/step - loss: 1.1704 - recall: 0.6879 - val_loss: 0.6253 - val_recall: 0.4356
Epoch 6/25
99/99 [==============================] - 0s 5ms/step - loss: 1.1206 - recall: 0.7025 - val_loss: 0.6259 - val_recall: 0.5000
Epoch 7/25
99/99 [==============================] - 1s 8ms/step - loss: 1.0504 - recall: 0.7393 - val_loss: 0.6225 - val_recall: 0.5337
Epoch 8/25
99/99 [==============================] - 1s 7ms/step - loss: 0.9739 - recall: 0.7822 - val_loss: 0.6207 - val_recall: 0.5337
Epoch 9/25
99/99 [==============================] - 1s 8ms/step - loss: 0.9189 - recall: 0.7983 - val_loss: 0.6144 - val_recall: 0.5276
Epoch 10/25
99/99 [==============================] - 1s 8ms/step - loss: 0.8895 - recall: 0.7899 - val_loss: 0.6419 - val_recall: 0.5583
Epoch 11/25
99/99 [==============================] - 0s 5ms/step - loss: 0.8411 - recall: 0.8052 - val_loss: 0.6529 - val_recall: 0.5552
Epoch 12/25
99/99 [==============================] - 0s 5ms/step - loss: 0.8032 - recall: 0.8198 - val_loss: 0.6572 - val_recall: 0.5429
Epoch 13/25
99/99 [==============================] - 1s 5ms/step - loss: 0.7788 - recall: 0.8367 - val_loss: 0.6661 - val_recall: 0.5491
Epoch 14/25
99/99 [==============================] - 0s 5ms/step - loss: 0.7505 - recall: 0.8489 - val_loss: 0.6761 - val_recall: 0.5460
Epoch 15/25
99/99 [==============================] - 0s 5ms/step - loss: 0.7223 - recall: 0.8551 - val_loss: 0.6907 - val_recall: 0.5552
Epoch 16/25
99/99 [==============================] - 0s 5ms/step - loss: 0.7059 - recall: 0.8428 - val_loss: 0.7134 - val_recall: 0.5706
Epoch 17/25
99/99 [==============================] - 0s 5ms/step - loss: 0.6908 - recall: 0.8520 - val_loss: 0.7174 - val_recall: 0.5460
Epoch 18/25
99/99 [==============================] - 0s 5ms/step - loss: 0.6789 - recall: 0.8512 - val_loss: 0.7540 - val_recall: 0.5613
Epoch 19/25
99/99 [==============================] - 1s 6ms/step - loss: 0.6697 - recall: 0.8466 - val_loss: 0.7699 - val_recall: 0.5644
Epoch 20/25
99/99 [==============================] - 1s 6ms/step - loss: 0.6577 - recall: 0.8489 - val_loss: 0.7893 - val_recall: 0.5675
Epoch 21/25
99/99 [==============================] - 0s 5ms/step - loss: 0.6365 - recall: 0.8597 - val_loss: 0.8050 - val_recall: 0.5675
Epoch 22/25
99/99 [==============================] - 1s 5ms/step - loss: 0.6297 - recall: 0.8689 - val_loss: 0.8064 - val_recall: 0.5644
Epoch 23/25
99/99 [==============================] - 0s 5ms/step - loss: 0.6162 - recall: 0.8543 - val_loss: 0.8462 - val_recall: 0.5644
Epoch 24/25
99/99 [==============================] - 1s 5ms/step - loss: 0.6175 - recall: 0.8673 - val_loss: 0.8495 - val_recall: 0.5644
Epoch 25/25
99/99 [==============================] - 0s 5ms/step - loss: 0.6057 - recall: 0.8673 - val_loss: 0.8617 - val_recall: 0.5675
In [123]:
print("Time taken in seconds ",end-start)
Time taken in seconds  17.52609157562256
In [124]:
plot(history,'loss')
In [125]:
y_train_pred = model1.predict(X_train)
y_train_pred = (y_train_pred > 0.5)
y_train_pred
200/200 [==============================] - 0s 2ms/step
Out[125]:
array([[ True],
       [False],
       [False],
       ...,
       [False],
       [ True],
       [ True]])
In [126]:
y_valid_pred = model1.predict(X_valid)
y_valid_pred = (y_valid_pred > 0.5)
y_valid_pred
50/50 [==============================] - 0s 2ms/step
Out[126]:
array([[False],
       [False],
       [False],
       ...,
       [False],
       [ True],
       [ True]])
In [127]:
cl_rp = classification_report(y_train, y_train_pred)
print(cl_rp)
              precision    recall  f1-score   support

           0       0.98      0.86      0.92      5096
           1       0.63      0.92      0.75      1304

    accuracy                           0.87      6400
   macro avg       0.80      0.89      0.83      6400
weighted avg       0.91      0.87      0.88      6400

In [128]:
cl_rp = classification_report(y_valid, y_valid_pred)
print(cl_rp)
              precision    recall  f1-score   support

           0       0.87      0.71      0.78      1274
           1       0.33      0.57      0.42       326

    accuracy                           0.68      1600
   macro avg       0.60      0.64      0.60      1600
weighted avg       0.76      0.68      0.71      1600

In [129]:
modelName="NN with Adam and dropout"
train_data.loc[modelName] = recall_score(y_train, y_train_pred)
valid_data.loc[modelName] = recall_score(y_valid, y_valid_pred)
In [130]:
make_confusion_matrix(y_train, y_train_pred)
In [131]:
make_confusion_matrix(y_valid, y_valid_pred)

NN with Adam and dropout seemed to decrease the performance for the training set but increased the performance for the validation set. This is a better model compared to the others so far.

Neural Network with Balanced Data (by applying SMOTE) and SGD Optimizer¶

In [132]:
smote = SMOTE(random_state=42)  # Create the SMOTE object
X_train_smote, y_train_smote = smote.fit_resample(X_train, y_train)      # Resample the data
In [133]:
print('After UpSampling, the shape of train_X: {}'.format(X_train_smote.shape))
print('After UpSampling, the shape of train_y: {} \n'.format(y_train_smote.shape))
After UpSampling, the shape of train_X: (10192, 2943)
After UpSampling, the shape of train_y: (10192,) 

In [134]:
backend.clear_session()
np.random.seed(2)
random.seed(2)
tf.random.set_seed(2)
In [135]:
#Initializing the neural network
model2 = Sequential()
model2.add(Dense(70,activation="relu",input_dim=X_train_smote.shape[1]))
model2.add(Dense(17,activation="relu"))
model2.add(Dense(17,activation="relu"))
model2.add(Dense(1,activation="sigmoid"))
In [136]:
model2.summary()
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 dense (Dense)               (None, 70)                206080    
                                                                 
 dense_1 (Dense)             (None, 17)                1207      
                                                                 
 dense_2 (Dense)             (None, 17)                306       
                                                                 
 dense_3 (Dense)             (None, 1)                 18        
                                                                 
=================================================================
Total params: 207611 (810.98 KB)
Trainable params: 207611 (810.98 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
In [140]:
optimizer = tf.keras.optimizers.SGD(0.001)    # defining SGD as the optimizer to be used
metric = keras.metrics.Recall()
model2.compile(loss='binary_crossentropy', optimizer=optimizer,metrics = ['Recall'])
In [141]:
start = time.time()
history = model2.fit(X_train_smote, y_train_smote, validation_data=(X_valid,y_valid) , batch_size=batch_size, epochs=epochs,class_weight=cw_dict)
end=time.time()
Epoch 1/25
157/157 [==============================] - 2s 7ms/step - loss: 1.8193 - recall: 0.9541 - val_loss: 0.9011 - val_recall: 0.9847
Epoch 2/25
157/157 [==============================] - 1s 5ms/step - loss: 1.5489 - recall: 0.9984 - val_loss: 0.9720 - val_recall: 0.9939
Epoch 3/25
157/157 [==============================] - 1s 5ms/step - loss: 1.4393 - recall: 0.9992 - val_loss: 0.9848 - val_recall: 0.9939
Epoch 4/25
157/157 [==============================] - 1s 5ms/step - loss: 1.3738 - recall: 0.9994 - val_loss: 0.9814 - val_recall: 0.9908
Epoch 5/25
157/157 [==============================] - 1s 4ms/step - loss: 1.3222 - recall: 0.9992 - val_loss: 0.9716 - val_recall: 0.9908
Epoch 6/25
157/157 [==============================] - 1s 5ms/step - loss: 1.2779 - recall: 0.9984 - val_loss: 0.9586 - val_recall: 0.9847
Epoch 7/25
157/157 [==============================] - 1s 5ms/step - loss: 1.2378 - recall: 0.9978 - val_loss: 0.9481 - val_recall: 0.9816
Epoch 8/25
157/157 [==============================] - 1s 5ms/step - loss: 1.2005 - recall: 0.9971 - val_loss: 0.9340 - val_recall: 0.9724
Epoch 9/25
157/157 [==============================] - 1s 5ms/step - loss: 1.1650 - recall: 0.9961 - val_loss: 0.9282 - val_recall: 0.9663
Epoch 10/25
157/157 [==============================] - 1s 5ms/step - loss: 1.1307 - recall: 0.9955 - val_loss: 0.9187 - val_recall: 0.9571
Epoch 11/25
157/157 [==============================] - 1s 5ms/step - loss: 1.0974 - recall: 0.9941 - val_loss: 0.9095 - val_recall: 0.9479
Epoch 12/25
157/157 [==============================] - 1s 6ms/step - loss: 1.0647 - recall: 0.9922 - val_loss: 0.8959 - val_recall: 0.9202
Epoch 13/25
157/157 [==============================] - 1s 8ms/step - loss: 1.0328 - recall: 0.9896 - val_loss: 0.8892 - val_recall: 0.8834
Epoch 14/25
157/157 [==============================] - 1s 8ms/step - loss: 1.0014 - recall: 0.9869 - val_loss: 0.8786 - val_recall: 0.8374
Epoch 15/25
157/157 [==============================] - 1s 5ms/step - loss: 0.9705 - recall: 0.9843 - val_loss: 0.8706 - val_recall: 0.8067
Epoch 16/25
157/157 [==============================] - 1s 5ms/step - loss: 0.9403 - recall: 0.9843 - val_loss: 0.8574 - val_recall: 0.7791
Epoch 17/25
157/157 [==============================] - 1s 5ms/step - loss: 0.9111 - recall: 0.9821 - val_loss: 0.8514 - val_recall: 0.7699
Epoch 18/25
157/157 [==============================] - 1s 5ms/step - loss: 0.8824 - recall: 0.9816 - val_loss: 0.8408 - val_recall: 0.7270
Epoch 19/25
157/157 [==============================] - 1s 5ms/step - loss: 0.8546 - recall: 0.9798 - val_loss: 0.8354 - val_recall: 0.7147
Epoch 20/25
157/157 [==============================] - 1s 5ms/step - loss: 0.8279 - recall: 0.9788 - val_loss: 0.8275 - val_recall: 0.6994
Epoch 21/25
157/157 [==============================] - 1s 5ms/step - loss: 0.8022 - recall: 0.9792 - val_loss: 0.8160 - val_recall: 0.6810
Epoch 22/25
157/157 [==============================] - 1s 7ms/step - loss: 0.7777 - recall: 0.9774 - val_loss: 0.8106 - val_recall: 0.6656
Epoch 23/25
157/157 [==============================] - 1s 9ms/step - loss: 0.7546 - recall: 0.9765 - val_loss: 0.8009 - val_recall: 0.6258
Epoch 24/25
157/157 [==============================] - 2s 10ms/step - loss: 0.7331 - recall: 0.9772 - val_loss: 0.7943 - val_recall: 0.6043
Epoch 25/25
157/157 [==============================] - 2s 11ms/step - loss: 0.7129 - recall: 0.9761 - val_loss: 0.7885 - val_recall: 0.5951
In [142]:
print("Time taken in seconds ",end-start)
Time taken in seconds  42.71089267730713
In [143]:
plot(history,'loss')
In [144]:
y_train_pred = model2.predict(X_train_smote)
y_train_pred = (y_train_pred > 0.5)
y_train_pred
319/319 [==============================] - 1s 2ms/step
Out[144]:
array([[ True],
       [False],
       [False],
       ...,
       [ True],
       [ True],
       [ True]])
In [145]:
y_valid_pred = model2.predict(X_valid)
y_valid_pred = (y_valid_pred > 0.5)
y_valid_pred
50/50 [==============================] - 0s 2ms/step
Out[145]:
array([[False],
       [ True],
       [False],
       ...,
       [ True],
       [ True],
       [ True]])
In [146]:
cl_rp = classification_report(y_train_smote, y_train_pred)
print(cl_rp)
              precision    recall  f1-score   support

           0       0.97      0.67      0.79      5096
           1       0.75      0.98      0.85      5096

    accuracy                           0.82     10192
   macro avg       0.86      0.82      0.82     10192
weighted avg       0.86      0.82      0.82     10192

In [147]:
cl_rp = classification_report(y_valid, y_valid_pred)
print(cl_rp)
              precision    recall  f1-score   support

           0       0.84      0.54      0.66      1274
           1       0.25      0.60      0.35       326

    accuracy                           0.56      1600
   macro avg       0.55      0.57      0.51      1600
weighted avg       0.72      0.56      0.60      1600

In [148]:
modelName="NN with SMOTE and SGD"
train_data.loc[modelName] = recall_score(y_train_smote, y_train_pred)
valid_data.loc[modelName] = recall_score(y_valid, y_valid_pred)
In [149]:
make_confusion_matrix(y_train_smote, y_train_pred)
In [150]:
make_confusion_matrix(y_valid, y_valid_pred)

NN with SMOTE and SGD showed a decrease in recall values and would not be an improvement upon our previous model.

Neural Network with Balanced Data (by applying SMOTE) and Adam Optimizer¶

In [151]:
backend.clear_session()
np.random.seed(2)
random.seed(2)
tf.random.set_seed(2)
In [152]:
#Initializing the neural network
model3 = Sequential()
model3.add(Dense(70,activation="relu",input_dim=X_train_smote.shape[1]))
model3.add(Dense(17,activation="relu"))
model3.add(Dense(17,activation="relu"))
model3.add(Dense(1,activation="sigmoid"))
In [153]:
model3.summary()
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 dense (Dense)               (None, 70)                206080    
                                                                 
 dense_1 (Dense)             (None, 17)                1207      
                                                                 
 dense_2 (Dense)             (None, 17)                306       
                                                                 
 dense_3 (Dense)             (None, 1)                 18        
                                                                 
=================================================================
Total params: 207611 (810.98 KB)
Trainable params: 207611 (810.98 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
In [156]:
optimizer = tf.keras.optimizers.Adam()    # defining SGD as the optimizer to be used
metric = keras.metrics.Recall()
model3.compile(loss='binary_crossentropy', optimizer=optimizer,metrics = ['Recall'])
In [157]:
start = time.time()
history = model3.fit(X_train_smote, y_train_smote, validation_data=(X_valid,y_valid) , batch_size=batch_size, epochs=epochs,class_weight=cw_dict)
end=time.time()
Epoch 1/25
157/157 [==============================] - 2s 7ms/step - loss: 1.1999 - recall: 0.9700 - val_loss: 0.8121 - val_recall: 0.6043
Epoch 2/25
157/157 [==============================] - 1s 5ms/step - loss: 0.7605 - recall: 0.9557 - val_loss: 0.7599 - val_recall: 0.5644
Epoch 3/25
157/157 [==============================] - 1s 7ms/step - loss: 0.6164 - recall: 0.9696 - val_loss: 0.7111 - val_recall: 0.5123
Epoch 4/25
157/157 [==============================] - 1s 9ms/step - loss: 0.5577 - recall: 0.9717 - val_loss: 0.7791 - val_recall: 0.5491
Epoch 5/25
157/157 [==============================] - 1s 9ms/step - loss: 0.5108 - recall: 0.9788 - val_loss: 0.8028 - val_recall: 0.5890
Epoch 6/25
157/157 [==============================] - 1s 9ms/step - loss: 0.4616 - recall: 0.9825 - val_loss: 0.7680 - val_recall: 0.5184
Epoch 7/25
157/157 [==============================] - 1s 6ms/step - loss: 0.4203 - recall: 0.9851 - val_loss: 0.8012 - val_recall: 0.4908
Epoch 8/25
157/157 [==============================] - 1s 5ms/step - loss: 0.3778 - recall: 0.9857 - val_loss: 0.8295 - val_recall: 0.4939
Epoch 9/25
157/157 [==============================] - 1s 5ms/step - loss: 0.3336 - recall: 0.9884 - val_loss: 0.8698 - val_recall: 0.4724
Epoch 10/25
157/157 [==============================] - 1s 5ms/step - loss: 0.2921 - recall: 0.9908 - val_loss: 0.8712 - val_recall: 0.4110
Epoch 11/25
157/157 [==============================] - 1s 5ms/step - loss: 0.2457 - recall: 0.9925 - val_loss: 0.9552 - val_recall: 0.4233
Epoch 12/25
157/157 [==============================] - 1s 5ms/step - loss: 0.2168 - recall: 0.9929 - val_loss: 0.9588 - val_recall: 0.3650
Epoch 13/25
157/157 [==============================] - 1s 5ms/step - loss: 0.1742 - recall: 0.9965 - val_loss: 1.0398 - val_recall: 0.3620
Epoch 14/25
157/157 [==============================] - 1s 5ms/step - loss: 0.1450 - recall: 0.9973 - val_loss: 1.0610 - val_recall: 0.3190
Epoch 15/25
157/157 [==============================] - 1s 5ms/step - loss: 0.1255 - recall: 0.9967 - val_loss: 1.1189 - val_recall: 0.3190
Epoch 16/25
157/157 [==============================] - 1s 5ms/step - loss: 0.1071 - recall: 0.9969 - val_loss: 1.1895 - val_recall: 0.2883
Epoch 17/25
157/157 [==============================] - 1s 5ms/step - loss: 0.0910 - recall: 0.9978 - val_loss: 1.2393 - val_recall: 0.2914
Epoch 18/25
157/157 [==============================] - 1s 5ms/step - loss: 0.0801 - recall: 0.9974 - val_loss: 1.2890 - val_recall: 0.2914
Epoch 19/25
157/157 [==============================] - 1s 5ms/step - loss: 0.0643 - recall: 0.9978 - val_loss: 1.3613 - val_recall: 0.3129
Epoch 20/25
157/157 [==============================] - 2s 10ms/step - loss: 0.0588 - recall: 0.9992 - val_loss: 1.3809 - val_recall: 0.2791
Epoch 21/25
157/157 [==============================] - 2s 12ms/step - loss: 0.0487 - recall: 0.9994 - val_loss: 1.4757 - val_recall: 0.2883
Epoch 22/25
157/157 [==============================] - 2s 10ms/step - loss: 0.0439 - recall: 0.9988 - val_loss: 1.5160 - val_recall: 0.2822
Epoch 23/25
157/157 [==============================] - 1s 8ms/step - loss: 0.0482 - recall: 0.9988 - val_loss: 1.5498 - val_recall: 0.3067
Epoch 24/25
157/157 [==============================] - 1s 5ms/step - loss: 0.0415 - recall: 0.9994 - val_loss: 1.6068 - val_recall: 0.3067
Epoch 25/25
157/157 [==============================] - 1s 5ms/step - loss: 0.0323 - recall: 0.9996 - val_loss: 1.6510 - val_recall: 0.2669
In [158]:
print("Time taken in seconds ",end-start)
Time taken in seconds  27.022613763809204
In [159]:
plot(history,'loss')
In [160]:
y_train_pred = model3.predict(X_train_smote)
y_train_pred = (y_train_pred > 0.5)
y_train_pred
319/319 [==============================] - 1s 2ms/step
Out[160]:
array([[ True],
       [False],
       [False],
       ...,
       [ True],
       [ True],
       [ True]])
In [161]:
y_valid_pred = model3.predict(X_valid)
y_valid_pred = (y_valid_pred > 0.5)
y_valid_pred
50/50 [==============================] - 0s 2ms/step
Out[161]:
array([[False],
       [False],
       [False],
       ...,
       [False],
       [ True],
       [False]])
In [162]:
cl_rp = classification_report(y_train_smote, y_train_pred)
print(cl_rp)
              precision    recall  f1-score   support

           0       1.00      0.99      1.00      5096
           1       0.99      1.00      1.00      5096

    accuracy                           1.00     10192
   macro avg       1.00      1.00      1.00     10192
weighted avg       1.00      1.00      1.00     10192

In [163]:
cl_rp = classification_report(y_valid, y_valid_pred)
print(cl_rp)
              precision    recall  f1-score   support

           0       0.82      0.88      0.85      1274
           1       0.36      0.27      0.31       326

    accuracy                           0.76      1600
   macro avg       0.59      0.57      0.58      1600
weighted avg       0.73      0.76      0.74      1600

In [164]:
modelName="NN with SMOTE and Adam"
train_data.loc[modelName] = recall_score(y_train_smote, y_train_pred)
valid_data.loc[modelName] = recall_score(y_valid, y_valid_pred)
In [165]:
make_confusion_matrix(y_train_smote, y_train_pred)
In [166]:
make_confusion_matrix(y_valid, y_valid_pred)

NN with SMOTE and Adam gave a recall of 1 for exited customers in the training set but gave only a value of 0.27 for the validation set. This is not an ideal model as it is an overfitted model with a lot of noise.

Neural Network with Balanced Data (by applying SMOTE), Adam Optimizer, and Dropout¶

In [167]:
backend.clear_session()
np.random.seed(2)
random.seed(2)
tf.random.set_seed(2)
In [168]:
#Initializing the neural network
model4 = Sequential()
model4.add(Dense(35,activation="relu",input_dim=X_train_smote.shape[1]))
model4.add(Dropout(0.4))
model4.add(Dense(7,activation="relu"))
model4.add(Dropout(0.2))
model4.add(Dense(1,activation="sigmoid"))
In [169]:
model4.summary()
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 dense (Dense)               (None, 35)                103040    
                                                                 
 dropout (Dropout)           (None, 35)                0         
                                                                 
 dense_1 (Dense)             (None, 7)                 252       
                                                                 
 dropout_1 (Dropout)         (None, 7)                 0         
                                                                 
 dense_2 (Dense)             (None, 1)                 8         
                                                                 
=================================================================
Total params: 103300 (403.52 KB)
Trainable params: 103300 (403.52 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
In [170]:
optimizer = tf.keras.optimizers.Adam()    # defining SGD as the optimizer to be used
metric = keras.metrics.Recall()
model4.compile(loss='binary_crossentropy', optimizer=optimizer,metrics = ['Recall'])
In [171]:
start = time.time()
history = model4.fit(X_train_smote, y_train_smote, validation_data=(X_valid,y_valid) , batch_size=batch_size, epochs=epochs,class_weight=cw_dict)
end=time.time()
Epoch 1/25
157/157 [==============================] - 2s 7ms/step - loss: 1.4658 - recall: 0.9700 - val_loss: 0.8654 - val_recall: 0.8344
Epoch 2/25
157/157 [==============================] - 1s 5ms/step - loss: 1.0600 - recall: 0.9700 - val_loss: 0.7939 - val_recall: 0.6656
Epoch 3/25
157/157 [==============================] - 1s 7ms/step - loss: 0.8777 - recall: 0.9625 - val_loss: 0.7825 - val_recall: 0.6043
Epoch 4/25
157/157 [==============================] - 1s 8ms/step - loss: 0.7886 - recall: 0.9609 - val_loss: 0.7979 - val_recall: 0.5828
Epoch 5/25
157/157 [==============================] - 1s 6ms/step - loss: 0.7466 - recall: 0.9584 - val_loss: 0.8239 - val_recall: 0.5798
Epoch 6/25
157/157 [==============================] - 1s 5ms/step - loss: 0.6920 - recall: 0.9619 - val_loss: 0.7807 - val_recall: 0.5583
Epoch 7/25
157/157 [==============================] - 1s 5ms/step - loss: 0.6774 - recall: 0.9657 - val_loss: 0.7815 - val_recall: 0.5521
Epoch 8/25
157/157 [==============================] - 1s 5ms/step - loss: 0.6476 - recall: 0.9641 - val_loss: 0.7802 - val_recall: 0.5460
Epoch 9/25
157/157 [==============================] - 1s 5ms/step - loss: 0.6245 - recall: 0.9659 - val_loss: 0.8246 - val_recall: 0.5736
Epoch 10/25
157/157 [==============================] - 1s 5ms/step - loss: 0.6299 - recall: 0.9674 - val_loss: 0.7966 - val_recall: 0.5767
Epoch 11/25
157/157 [==============================] - 1s 5ms/step - loss: 0.6060 - recall: 0.9721 - val_loss: 0.8023 - val_recall: 0.5675
Epoch 12/25
157/157 [==============================] - 1s 5ms/step - loss: 0.5739 - recall: 0.9729 - val_loss: 0.8054 - val_recall: 0.5736
Epoch 13/25
157/157 [==============================] - 1s 5ms/step - loss: 0.5694 - recall: 0.9719 - val_loss: 0.8285 - val_recall: 0.5828
Epoch 14/25
157/157 [==============================] - 1s 5ms/step - loss: 0.5585 - recall: 0.9725 - val_loss: 0.8253 - val_recall: 0.5859
Epoch 15/25
157/157 [==============================] - 1s 5ms/step - loss: 0.5630 - recall: 0.9712 - val_loss: 0.8167 - val_recall: 0.5951
Epoch 16/25
157/157 [==============================] - 1s 5ms/step - loss: 0.5447 - recall: 0.9753 - val_loss: 0.8226 - val_recall: 0.5951
Epoch 17/25
157/157 [==============================] - 1s 5ms/step - loss: 0.5330 - recall: 0.9765 - val_loss: 0.8409 - val_recall: 0.6012
Epoch 18/25
157/157 [==============================] - 1s 6ms/step - loss: 0.5209 - recall: 0.9770 - val_loss: 0.8380 - val_recall: 0.5798
Epoch 19/25
157/157 [==============================] - 1s 7ms/step - loss: 0.5237 - recall: 0.9759 - val_loss: 0.8490 - val_recall: 0.5828
Epoch 20/25
157/157 [==============================] - 1s 8ms/step - loss: 0.5082 - recall: 0.9763 - val_loss: 0.8733 - val_recall: 0.5982
Epoch 21/25
157/157 [==============================] - 1s 5ms/step - loss: 0.4849 - recall: 0.9810 - val_loss: 0.8725 - val_recall: 0.5798
Epoch 22/25
157/157 [==============================] - 1s 5ms/step - loss: 0.4727 - recall: 0.9784 - val_loss: 0.8953 - val_recall: 0.5675
Epoch 23/25
157/157 [==============================] - 1s 5ms/step - loss: 0.4539 - recall: 0.9806 - val_loss: 0.8978 - val_recall: 0.5767
Epoch 24/25
157/157 [==============================] - 1s 5ms/step - loss: 0.4542 - recall: 0.9802 - val_loss: 0.8677 - val_recall: 0.5521
Epoch 25/25
157/157 [==============================] - 1s 5ms/step - loss: 0.4385 - recall: 0.9796 - val_loss: 0.8726 - val_recall: 0.5337
In [172]:
print("Time taken in seconds ",end-start)
Time taken in seconds  22.912023544311523
In [173]:
plot(history,'loss')
In [174]:
y_train_pred = model4.predict(X_train_smote)
y_train_pred = (y_train_pred > 0.5)
y_train_pred
319/319 [==============================] - 1s 2ms/step
Out[174]:
array([[ True],
       [False],
       [False],
       ...,
       [ True],
       [ True],
       [ True]])
In [175]:
y_valid_pred = model4.predict(X_valid)
y_valid_pred = (y_valid_pred > 0.5)
y_valid_pred
50/50 [==============================] - 0s 2ms/step
Out[175]:
array([[False],
       [False],
       [False],
       ...,
       [False],
       [ True],
       [ True]])
In [176]:
cl_rp = classification_report(y_train_smote, y_train_pred)
print(cl_rp)
              precision    recall  f1-score   support

           0       0.98      0.89      0.93      5096
           1       0.90      0.99      0.94      5096

    accuracy                           0.94     10192
   macro avg       0.94      0.94      0.94     10192
weighted avg       0.94      0.94      0.94     10192

In [177]:
cl_rp = classification_report(y_valid, y_valid_pred)
print(cl_rp)
              precision    recall  f1-score   support

           0       0.86      0.75      0.80      1274
           1       0.35      0.53      0.43       326

    accuracy                           0.71      1600
   macro avg       0.61      0.64      0.61      1600
weighted avg       0.76      0.71      0.73      1600

In [178]:
modelName="NN with SMOTE, Adam and dropout"
train_data.loc[modelName] = recall_score(y_train_smote, y_train_pred)
valid_data.loc[modelName] = recall_score(y_valid, y_valid_pred)
In [179]:
make_confusion_matrix(y_train_smote, y_train_pred)
In [180]:
make_confusion_matrix(y_valid, y_valid_pred)

NN with SMOTE, Adam and dropout has a low precision but a reasonably high recall for training. However there is a significant ddifference in performance of the validation set.

Model Performance Comparison and Final Model Selection¶

In [181]:
print("Training data")
train_data
Training data
Out[181]:
recall
NN SGD 0.815951
NN Adam 0.999233
NN with Adam and dropout 0.915644
NN with SMOTE and SGD 0.978807
NN with SMOTE and Adam 0.999608
NN with SMOTE, Adam and dropout 0.985871
In [182]:
print("validating data")
valid_data
validating data
Out[182]:
recall
NN SGD 0.386503
NN Adam 0.533742
NN with Adam and dropout 0.567485
NN with SMOTE and SGD 0.595092
NN with SMOTE and Adam 0.266871
NN with SMOTE, Adam and dropout 0.533742
In [183]:
diff = train_data-valid_data
diff
Out[183]:
recall
NN SGD 0.429448
NN Adam 0.465491
NN with Adam and dropout 0.348160
NN with SMOTE and SGD 0.383715
NN with SMOTE and Adam 0.732736
NN with SMOTE, Adam and dropout 0.452129

NN with SMOTE and Adam performs the best on training data but very poorly on the validation set. NN SGD has less difference but performance wasn't the best. NN Adam had a 0.99 on training set but the validation set wasn't great and had a big drop from training set to validation set.

NN with Adam and dropout has a great value on training set and a reasonably good value on the validation set too. It also has the least difference between the performances for the training set and validation set. Therefore, I chose NN with Adam and dropout as my final model.

In [196]:
y_test_pred = model4.predict(X_test)
y_test_pred = (y_test_pred > 0.5)
print(y_test_pred)
63/63 [==============================] - 0s 4ms/step
[[False]
 [ True]
 [False]
 ...
 [ True]
 [False]
 [False]]
In [197]:
make_confusion_matrix(y_test,y_test_pred)
In [198]:
cl_rp = classification_report(y_test, y_test_pred)
print(cl_rp)
              precision    recall  f1-score   support

           0       0.86      0.75      0.80      1593
           1       0.35      0.53      0.42       407

    accuracy                           0.70      2000
   macro avg       0.60      0.64      0.61      2000
weighted avg       0.76      0.70      0.72      2000

Actionable Insights and Business Recommendations¶

Observations

  • Age has a positive correlance with exited. Over 75% of the customers who did not exit were 45 years or younger.
  • IsActiveMember has a negative correlance to exited. Those who were active were less likely to leave.
  • Customers based in France make up over 50% of the customers but customers based in Germany had the largest group of exited customer base.

Business Recommendations

  • Target younger customers to join as they are more likely to stay.

  • Encourage customers to stay active with the help of monthly deals and coupons.

  • Provide offers like discounts to luxury brands and free subscriptions to movie apps, music apps, etc., through brand partnerships to incentivize customers to stay.

  • Target customers located in France.

  • Since Germany has a higher percentage of people leaving, identify what may have caused this and try to prevent more german customers from leaving.

Power Ahead